Informed algorithms for sound source separation in enclosed reverberant environments
While humans can separate a sound of interest amidst a cacophony of contending sounds in an echoic environment, machine-based methods lag behind in solving this task. This thesis thus aims at improving the performance of audio separation algorithms when they are informed, i.e., have access to source location information. These locations are assumed to be known a priori in this work, for example from video processing.
Initially, a multi-microphone array based method combined with binary
time-frequency masking is proposed. A robust least squares frequency invariant data independent beamformer designed with the location information is
utilized to estimate the sources. To further enhance the estimated sources, binary time-frequency masking based post-processing is used but cepstral domain smoothing is required to mitigate musical noise.
To tackle the under-determined case and further improve separation performance
at higher reverberation times, a two-microphone based method
which is inspired by human auditory processing and generates soft time-frequency masks is described. In this approach interaural level difference,
interaural phase difference and mixing vectors are probabilistically modeled in the time-frequency domain and the model parameters are learned
through the expectation-maximization (EM) algorithm. A direction vector is estimated for each source, using the location information, which is used as
the mean parameter of the mixing vector model. Soft time-frequency masks are used to reconstruct the sources. A spatial covariance model is then integrated into the probabilistic model framework that encodes the spatial
characteristics of the enclosure and further improves the separation performance
in challenging scenarios i.e. when sources are in close proximity and
when the level of reverberation is high.
Finally, a new dereverberation-based pre-processing method is proposed, consisting of a cascade of three dereverberation stages, each of which enhances the two-microphone
reverberant mixture. The dereverberation stages are based on amplitude spectral subtraction, where the late reverberation is estimated and suppressed. The combination of this dereverberation-based pre-processing with soft mask separation yields the best separation performance. All methods are evaluated with real and synthetic mixtures formed, for example, from speech signals from the TIMIT database and measured room impulse responses.
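The difference between the binary and soft time-frequency masks mentioned in this abstract can be illustrated with a minimal sketch. It assumes that magnitude estimates of a target and an interferer are already available on a common time-frequency grid (for example from a beamformer); the array names and the ratio-style soft mask are illustrative assumptions, not the thesis's EM-based model.

```python
import numpy as np

def binary_mask(target_mag, interferer_mag):
    """Return 1 where the target estimate dominates the interferer, else 0."""
    return (target_mag > interferer_mag).astype(float)

def soft_mask(target_mag, interferer_mag, eps=1e-12):
    """Ratio-style mask in [0, 1]; an assumed stand-in for the posterior
    probabilities that an EM-trained model would provide."""
    return target_mag**2 / (target_mag**2 + interferer_mag**2 + eps)

# Hypothetical magnitude estimates on a (frequency x time) grid.
rng = np.random.default_rng(0)
target_mag = rng.random((4, 5))
interferer_mag = rng.random((4, 5))

# Hypothetical complex mixture spectrogram with random phase.
phase = rng.uniform(-np.pi, np.pi, (4, 5))
mixture = (target_mag + interferer_mag) * np.exp(1j * phase)

hard = binary_mask(target_mag, interferer_mag)
soft = soft_mask(target_mag, interferer_mag)

# Masks are applied point-wise to the mixture spectrogram.
separated_hard = hard * mixture
separated_soft = soft * mixture
```

A binary mask switches each time-frequency bin fully on or off, which is one source of the musical noise the abstract mentions, whereas a soft mask attenuates bins gradually.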
Single channel speech enhancement by colored spectrograms
Speech enhancement concerns the processes required to remove unwanted
background sounds from the target speech to improve its quality and
intelligibility. In this paper, a novel approach for single-channel speech
enhancement is presented, using colored spectrograms. We propose the use of a
deep neural network (DNN) architecture adapted from the pix2pix generative
adversarial network (GAN) and train it over colored spectrograms of speech to
denoise them. After denoising, the colors of spectrograms are translated to
magnitudes of short-time Fourier transform (STFT) using a shallow regression
neural network. These estimated STFT magnitudes are later combined with the
noisy phases to obtain the enhanced speech. The results show an improvement of
almost 0.84 points in the perceptual evaluation of speech quality (PESQ) and 1%
in the short-term objective intelligibility (STOI) over the unprocessed noisy
data. The gain in quality and intelligibility over the unprocessed signal is
almost equal to the gain achieved by the baseline methods used for comparison
with the proposed model, but at a much reduced computational cost. The proposed
solution offers a comparable PESQ score at almost 10 times lower
computational cost than a similar baseline model, trained on grayscale
spectrograms, that generated the highest PESQ score, while it incurs only a 1%
deficit in STOI at 28 times lower computational cost when compared to another
baseline system based on a convolutional neural network-GAN (CNN-GAN) that
produces the most intelligible speech.
Comment: 18 pages, 6 figures, 5 tables
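The final reconstruction step described in this abstract, combining estimated STFT magnitudes with the noisy phase, can be sketched as follows. This is a minimal illustration with random placeholder data: the function name `recombine` and the "enhanced" magnitudes (here simply a scaled copy of the noisy magnitudes) are assumptions standing in for the network output, and the inverse STFT that would produce a waveform is not shown.

```python
import numpy as np

def recombine(enhanced_mag, noisy_stft):
    """Attach the phase of the noisy STFT to the enhanced magnitudes."""
    noisy_phase = np.angle(noisy_stft)
    return enhanced_mag * np.exp(1j * noisy_phase)

rng = np.random.default_rng(0)

# Hypothetical noisy STFT on a (frequency x time) grid.
noisy_stft = rng.standard_normal((5, 8)) + 1j * rng.standard_normal((5, 8))

# Placeholder for the model's magnitude estimate (not a real denoiser).
enhanced_mag = 0.5 * np.abs(noisy_stft)

enhanced_stft = recombine(enhanced_mag, noisy_stft)
```

Reusing the noisy phase keeps the model's task limited to magnitude estimation, at the price of leaving phase distortions uncorrected.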
Nitrous Oxide in Oxygen and Air in Oxygen for Perioperative Analgesia : A Comparative study
Background: To determine whether an additional dose of nalbuphine is required when medical air, rather than nitrous oxide, is used in oxygen to maintain anaesthesia, so that inadequate intra-operative analgesia can be avoided. Methods: This quasi-experimental study was carried out in the Department of Anaesthesia, Holy Family Hospital, Rawalpindi, from October 2007 to March 2008. One hundred patients scheduled for different elective surgical procedures under general anaesthesia were selected by non-probability convenience sampling. Patients aged 20 to 40 years and belonging to ASA Class I or II were included. They were divided into two groups (A and B). Group A comprised fifty patients who received medical air in oxygen. Group B comprised fifty patients who received nitrous oxide in oxygen. The conduct of anaesthesia was kept the same in both groups. Patients' heart rate, mean arterial pressure, pulse oximetry and ECG were monitored, and the requirement for an additional dose of nalbuphine in each group was noted. Intra-operative tachycardia and hypertension were taken as indications for an additional dose of nalbuphine. The average heart rate and blood pressure of each case were determined, and the data were compared and analyzed using SPSS-10. Results: Forty patients in group A did not require additional intra-operative nalbuphine, while the remaining ten patients required it. Forty-eight patients in group B did not require additional intra-operative nalbuphine and only two patients required it. Conclusion: The use of nitrous oxide significantly reduces the intra-operative narcotic analgesia requirement.
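The reported counts (10 of 50 versus 2 of 50 patients requiring additional nalbuphine) can be checked with a small worked calculation. The abstract does not name the statistical test that was run in SPSS-10, so the two-sided Fisher exact test below is an assumed choice for illustration, not the authors' analysis.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k first-column cases in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Counts from the abstract: (required, did not require) additional nalbuphine.
rate_air = 10 / 50   # group A, medical air in oxygen
rate_n2o = 2 / 50    # group B, nitrous oxide in oxygen
p_value = fisher_exact_two_sided(10, 40, 2, 48)
print(rate_air, rate_n2o, p_value)
```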
MaPLe: Multi-modal Prompt Learning
Pre-trained vision-language (V-L) models such as CLIP have shown excellent
generalization ability to downstream tasks. However, they are sensitive to the
choice of input text prompts and require careful selection of prompt templates
to perform well. Inspired by the Natural Language Processing (NLP) literature,
recent CLIP adaptation approaches learn prompts as the textual inputs to
fine-tune CLIP for downstream tasks. We note that using prompting to adapt
representations in a single branch of CLIP (language or vision) is sub-optimal
since it does not allow the flexibility to dynamically adjust both
representation spaces on a downstream task. In this work, we propose
Multi-modal Prompt Learning (MaPLe) for both vision and language branches to
improve alignment between the vision and language representations. Our design
promotes strong coupling between the vision-language prompts to ensure mutual
synergy and discourages learning independent uni-modal solutions. Further, we
learn separate prompts across different early stages to progressively model the
stage-wise feature relationships to allow rich context learning. We evaluate
the effectiveness of our approach on three representative tasks of
generalization to novel classes, new target datasets and unseen domain shifts.
Compared with the state-of-the-art method Co-CoOp, MaPLe exhibits favorable
performance and achieves an absolute gain of 3.45% on novel classes and 2.72%
on overall harmonic-mean, averaged over 11 diverse image recognition datasets.
Our code and pre-trained models are available at
https://github.com/muzairkhattak/multimodal-prompt-learning.
Comment: Accepted at CVPR202
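The idea of coupling vision and language prompts across several early stages can be sketched as follows. The abstract does not state the form of the coupling, so the per-stage linear map from language prompts to vision prompts used here is an assumption, and all sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: prompt tokens, text width, vision width, prompted stages.
n_tokens, d_text, d_vision, depth = 2, 512, 768, 3

# One set of learnable language prompts per prompted stage.
text_prompts = [0.02 * rng.standard_normal((n_tokens, d_text))
                for _ in range(depth)]

# Assumed coupling: a per-stage linear map from language to vision prompts,
# so the vision prompts are not independent parameters.
couplers = [0.02 * rng.standard_normal((d_text, d_vision))
            for _ in range(depth)]
vision_prompts = [p @ w for p, w in zip(text_prompts, couplers)]

def prepend_prompts(tokens, prompts):
    """Prepend prompt tokens to a (sequence, width) token matrix."""
    return np.concatenate([prompts, tokens], axis=0)

# Hypothetical patch embeddings entering the first prompted vision stage.
image_tokens = rng.standard_normal((50, d_vision))
stage0_input = prepend_prompts(image_tokens, vision_prompts[0])
```

Because the vision prompts are derived from the language prompts in this sketch, gradients from both branches would update the same underlying parameters, which is one way to discourage independent uni-modal solutions.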
Fine-tuned CLIP Models are Efficient Video Learners
Large-scale multi-modal training with image-text pairs imparts strong
generalization to the CLIP model. Since training on a similar scale for videos is
infeasible, recent approaches focus on the effective transfer of image-based
CLIP to the video domain. In this pursuit, new parametric modules are added to
learn temporal information and inter-frame relationships which require
meticulous design efforts. Furthermore, when the resulting models are learned
on videos, they tend to overfit on the given task distribution and lack
generalization. This raises the following question: how can image-level CLIP
representations be transferred effectively to videos? In this work, we show that
a simple Video Fine-tuned CLIP (ViFi-CLIP) baseline is generally sufficient to
bridge the domain gap from images to videos. Our qualitative analysis
illustrates that the frame-level processing from CLIP image-encoder followed by
feature pooling and similarity matching with corresponding text embeddings
helps in implicitly modeling the temporal cues within ViFi-CLIP. Such
fine-tuning helps the model to focus on scene dynamics, moving objects and
inter-object relationships. For low-data regimes where full fine-tuning is not
viable, we propose a `bridge and prompt' approach that first uses fine-tuning
to bridge the domain gap and then learns prompts on language and vision side to
adapt CLIP representations. We extensively evaluate this simple yet strong
baseline on zero-shot, base-to-novel generalization, few-shot and fully
supervised settings across five video benchmarks. Our code is available at
https://github.com/muzairkhattak/ViFi-CLIP.
Comment: Accepted at CVPR 202
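The pipeline described in this abstract (frame-level embeddings, feature pooling, then similarity matching with text embeddings) can be sketched with random placeholder vectors. Temporal average pooling and cosine similarity are assumptions here, since the abstract only says "feature pooling" and "similarity matching"; no real CLIP encoder is called.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def video_text_scores(frame_embeddings, text_embeddings):
    """frame_embeddings: (T, d); text_embeddings: (C, d) -> (C,) cosine scores."""
    video_embedding = frame_embeddings.mean(axis=0)  # temporal average pooling
    video_embedding = l2_normalize(video_embedding)
    text_embeddings = l2_normalize(text_embeddings)
    return text_embeddings @ video_embedding

rng = np.random.default_rng(0)

# Hypothetical text embeddings for three class prompts.
text_embeddings = rng.standard_normal((3, 16))

# Hypothetical frames: noisy copies of the embedding of class 1.
frame_embeddings = text_embeddings[1] + 0.1 * rng.standard_normal((8, 16))

scores = video_text_scores(frame_embeddings, text_embeddings)
predicted_class = int(np.argmax(scores))
```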
Harmonic Scalpel Hemorrhoidectomy Vs Milligan-Morgan Hemorrhoidectomy
Background: To compare Harmonic Scalpel Hemorrhoidectomy (HSH) with classical Milligan-Morgan Hemorrhoidectomy (MMH) in terms of operation time and post-operative pain, in order to establish the effectiveness of this novel procedure. Methods: A total of 62 patients planned for excision hemorrhoidectomy were randomly allocated to HSH and MMH groups. Operation time was recorded during surgery, and pain at the time of first defecation was recorded on a visual analog scale (VAS). Results: Mean VAS score at the time of first defecation after surgery was 4.32 (SD 0.909) in the HSH group and 6.97 (SD 1.426) in the MMH group (p value <0.000). Mean operation time was 18.13 (SD 3.956) minutes in the HSH group and 22.90 (SD 4.901) minutes in the MMH group (p value <0.000). Conclusion: Harmonic Scalpel Hemorrhoidectomy is better than Milligan-Morgan hemorrhoidectomy
A Dynamically Consistent Nonstandard Difference Scheme for a Discrete-Time Immunogenic Tumors Model
This manuscript presents a qualitative study of certain properties of an immunogenic tumors model. In particular, we obtain a dynamically consistent discrete-time immunogenic tumors model using a nonstandard difference scheme. The existence of fixed points and their stability are discussed. It is shown that the continuous system undergoes a Hopf bifurcation at one and only one positive fixed point, whereas its discrete-time counterpart undergoes a Neimark–Sacker bifurcation at one and only one positive fixed point. It is also shown that period-doubling bifurcation cannot occur in our discrete-time system. Additionally, numerical simulations are carried out in support of our theoretical discussion. Spanish Government and European Commission, Grant RTI2018-094336-B-I00 (MCIU/AEI/FEDER, UE); Basque Government, Grant IT1207-19
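What "dynamically consistent" means for a nonstandard difference scheme can be illustrated on a much simpler equation. The abstract does not give the tumor model's equations, so the sketch below swaps in the logistic equation dx/dt = x(1 - x) as a stand-in and compares forward Euler with a nonstandard scheme in which the quadratic term is discretized nonlocally; it is not the scheme derived in the manuscript.

```python
def euler_step(x, h):
    """Standard forward Euler step for dx/dt = x * (1 - x)."""
    return x + h * x * (1.0 - x)

def nsfd_step(x, h):
    """Nonstandard step: x**2 is replaced by x_n * x_{n+1}, giving
    (x_{n+1} - x_n) / h = x_n - x_n * x_{n+1}, solved for x_{n+1}."""
    return x * (1.0 + h) / (1.0 + h * x)

def iterate(step, x0, h, n):
    """Apply a one-step map n times and return the whole trajectory."""
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1], h))
    return xs

h = 2.5  # deliberately large step size
euler = iterate(euler_step, 0.1, h, 50)
nsfd = iterate(nsfd_step, 0.1, h, 50)
```

With this step size the Euler iterates overshoot the equilibrium x = 1 and keep oscillating, while the nonstandard iterates stay in (0, 1] and rise towards 1, as the continuous solution does.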
Targeted Genome Editing for Cotton Improvement
Conventional tools induce mutations randomly throughout the cotton genome, which makes breeding difficult. During the last decade, progress has been made in editing a gene of interest very precisely. Targeted genome engineering with engineered nucleases (ENs), specifically zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeat (CRISPR) RNA-guided nucleases (e.g., Cas9), has been described as a “game-changing technology” for fields as diverse as human genetics and plant biotechnology. In eukaryotic systems, ENs create double-strand breaks (DSBs) at the targeted DNA sequence, which are repaired by nonhomologous end joining (NHEJ) or homology-directed recombination (HDR) mechanisms. ENs have been used successfully for targeted mutagenesis, gene knockout, and multisite genome editing (GenEd) in model plants and crop plants such as cotton, rice, and wheat. Recently, the cotton genome has also been edited through CRISPR/Cas-based targeted mutagenesis for improved lateral root formation. In addition, an efficient and fast method has been developed to evaluate guide RNAs transiently in cotton. Targeted disruption of undesirable genes or metabolic pathways can be used to increase the quality of cotton. Undesirable metabolites such as gossypol in cottonseed can be targeted efficiently using ENs to obtain seed-specific low-gossypol cotton. Moreover, ENs are also helpful in gene stacking for herbicide resistance, insect resistance, and abiotic stress tolerance.